multi-view learning
SE(3) Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation
Incorporating inductive bias by embedding geometric entities (such as rays) as input has proven successful in multi-view learning. However, methods adopting this technique typically lack equivariance, which is crucial for effective 3D learning. Equivariance serves as a valuable inductive prior, aiding the generation of robust multi-view features for 3D scene understanding. In this paper, we explore the application of equivariant multi-view learning to depth estimation, not only recognizing its significance for computer vision and robotics but also addressing the limitations of prior work. Most previous studies have either overlooked equivariance in this setting or achieved only approximate equivariance through data augmentation, which often leads to inconsistencies across different reference frames. To address this issue, we propose to embed $SE(3)$ equivariance into the Perceiver IO architecture. We employ Spherical Harmonics for positional encoding to ensure 3D rotation equivariance, and develop a specialized equivariant encoder and decoder within Perceiver IO. To validate our model, we apply it to stereo depth estimation, achieving state-of-the-art results on real-world datasets without explicit geometric constraints or extensive data augmentation.
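As a concrete illustration of why spherical-harmonic positional encodings give rotation equivariance, the sketch below embeds a ray direction with degree-1 real spherical harmonics (which, up to a shared constant, are just the reordered coordinates) and checks that embedding a rotated ray equals applying the corresponding Wigner-D matrix to the original embedding. This is a minimal numerical check under a common real-SH basis convention, not the paper's implementation; the basis ordering and permutation matrix are illustrative assumptions.

```python
import numpy as np

# Degree-1 real spherical harmonics of a unit direction d = (x, y, z) are,
# up to a shared constant, (Y_{1,-1}, Y_{1,0}, Y_{1,1}) ∝ (y, z, x).
def sh_l1_embed(d):
    x, y, z = d
    return np.array([y, z, x])

# For l = 1 in this real basis, the Wigner-D matrix is the rotation matrix
# conjugated by the (x, y, z) -> (y, z, x) reordering.
def wigner_d_l1(R):
    P = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
    return P @ R @ P.T

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))       # random proper rotation, det = +1
d = rng.normal(size=3)
d /= np.linalg.norm(d)                  # unit ray direction

lhs = sh_l1_embed(R @ d)                # embed the rotated ray
rhs = wigner_d_l1(R) @ sh_l1_embed(d)   # rotate the embedding
```

Equivariance means `lhs` and `rhs` agree for every rotation; higher degrees transform by larger block-diagonal Wigner-D matrices in the same way.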
Trusted Multi-view Learning for Long-tailed Classification
Tang, Chuanqing, Shi, Yifei, Lin, Guanghao, Xing, Lei, Shi, Long
Class imbalance has been extensively studied in single-view scenarios; however, addressing this challenge in multi-view contexts remains an open problem, with even scarcer research focusing on trustworthy solutions. In this paper, we tackle a particularly challenging class imbalance problem in multi-view scenarios: long-tailed classification. We propose TMLC, a Trusted Multi-view Long-tailed Classification framework, which makes contributions on two critical aspects: opinion aggregation and pseudo-data generation. Specifically, inspired by Social Identity Theory, we design a group consensus opinion aggregation mechanism that guides decision-making toward the direction favored by the majority of the group. For pseudo-data generation, we introduce a novel distance metric to adapt SMOTE to multi-view scenarios and develop an uncertainty-guided data generation module that produces high-quality pseudo-data, effectively mitigating the adverse effects of class imbalance. Extensive experiments on long-tailed multi-view datasets demonstrate that our model achieves superior performance. The code is released at https://github.com/cncq-tang/TMLC.
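The multi-view SMOTE adaptation described above can be sketched as follows. This is a hypothetical minimal version: it uses a summed per-view Euclidean distance as the combined metric and reuses one interpolation weight across views so each generated sample stays view-consistent. The paper's actual distance metric and uncertainty-guided generation module are not reproduced here.

```python
import numpy as np

def multiview_smote(views, n_new, rng=None):
    """Minimal multi-view SMOTE sketch (hypothetical helper, not TMLC itself).
    views: list of (n_samples, d_v) arrays holding one minority class.
    Neighbors are found under a summed per-view Euclidean distance, and the
    same interpolation weight is shared across views."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = views[0].shape[0]
    # combined metric: sum of per-view pairwise Euclidean distances
    dist = sum(np.linalg.norm(V[:, None] - V[None, :], axis=-1) for V in views)
    np.fill_diagonal(dist, np.inf)        # exclude self-matches
    nn = dist.argmin(axis=1)              # 1-nearest neighbor per sample
    new_views = [[] for _ in views]
    for _ in range(n_new):
        i = rng.integers(n)               # pick a minority seed sample
        lam = rng.uniform()               # one weight, shared across views
        for v, V in enumerate(views):
            new_views[v].append(V[i] + lam * (V[nn[i]] - V[i]))
    return [np.array(x) for x in new_views]
```

Sharing the interpolation weight across views is the key design choice: interpolating each view independently could pair features that never co-occur, producing inconsistent pseudo-samples.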
We thank all reviewers for their valuable comments, which highlight the novelty, well-motivated objective, and promising results of our work. The code "ICP-pytorch" has been anonymously released on GitHub. We address the key concerns point-by-point below and will clarify the related issues in the paper. We sincerely hope R#2 will raise the score.
Reliable Disentanglement Multi-view Learning Against View Adversarial Attacks
Wang, Xuyang, Duan, Siyuan, Li, Qizhi, Duan, Guiduo, Sun, Yuan, Peng, Dezhong
Trustworthy multi-view learning has attracted extensive attention because evidence learning can provide reliable uncertainty estimation to enhance the credibility of multi-view predictions. Existing trusted multi-view learning methods implicitly assume that multi-view data is secure. However, in safety-sensitive applications such as autonomous driving and security monitoring, multi-view data often faces threats from adversarial perturbations that deceive or disrupt multi-view models. This inevitably leads to the adversarial unreliability problem (AUP) in trusted multi-view learning. To overcome this problem, we propose a novel multi-view learning framework, Reliable Disentanglement Multi-view Learning (RDML). Specifically, we first propose evidential disentanglement learning to decompose each view into clean and adversarial parts under the guidance of the corresponding evidence, which is extracted by a pretrained evidence extractor. Then, we employ a feature recalibration module to mitigate the negative impact of adversarial perturbations and to extract potentially informative features from them. Finally, to further discount irreparable adversarial interference, a view-level evidential attention mechanism is designed. Extensive experiments on multi-view classification tasks with adversarial attacks show that RDML outperforms the state-of-the-art methods by a relatively large margin. Our code is available at https://github.com/Willy1005/2025-IJCAI-RDML.
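Evidential methods such as RDML build on subjective logic, where a non-negative evidence vector $e$ over $K$ classes induces a Dirichlet distribution with parameters $\alpha = e + 1$, per-class beliefs $b_k = e_k / S$, and uncertainty $u = K / S$ with $S = \sum_k \alpha_k$, so that $\sum_k b_k + u = 1$. A minimal sketch of this standard evidence-to-opinion mapping (not RDML's architecture):

```python
import numpy as np

def opinion_from_evidence(e):
    """Map non-negative Dirichlet evidence e (length K) to a subjective-logic
    opinion: beliefs b_k = e_k / S and uncertainty u = K / S, S = sum(e + 1).
    Large total evidence -> low uncertainty; zero evidence -> u = 1."""
    e = np.asarray(e, dtype=float)
    K = e.size
    alpha = e + 1.0          # Dirichlet parameters
    S = alpha.sum()          # Dirichlet strength
    b = e / S                # per-class belief masses
    u = K / S                # vacuity / uncertainty mass
    return b, u

b, u = opinion_from_evidence([9, 0, 0])  # strong evidence for class 0
```

With evidence `[9, 0, 0]` the Dirichlet strength is 12, giving belief 0.75 on class 0 and uncertainty 0.25; as evidence grows, uncertainty shrinks toward zero.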
Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis
Wen, Wen, Gong, Tieliang, Dong, Yuxin, Yu, Shujian, Zhang, Weizhan
In most scientific data analysis scenarios, data collected from diverse domains and different sensors exhibit heterogeneous properties while preserving underlying connections. For example, (1) a piece of text can express the same semantics and sentiment in multiple languages; (2) a user's interests can be reflected in the text they post, the images they upload, and the videos they view; (3) animals perceive potential dangers in their surroundings through various senses such as sight, hearing, and smell. All of these reflect different perspectives of the data, collectively referred to as multi-view data. Extracting consensus and complementary information from multiple views to achieve a comprehensive representation of multi-view data has stimulated research interest across various fields and led to the development of multi-view learning (Hamdi et al., 2021; Fan et al., 2022; Fu et al., 2022; Hong et al., 2023). While various methodologies have emerged in multi-view learning, predominantly encompassing canonical correlation analysis (CCA)-based approaches (Gao et al., 2020; Chen et al., 2022; Shu et al., 2022) and engineering-driven techniques (Xu et al., 2021; Bai et al., 2023), these methods suffer from a critical limitation: their emphasis on maximizing cross-view consensus information often comes at the expense of view-specific, task-relevant information, thereby potentially compromising downstream performance (Liang et al., 2024). Recent efforts have been dedicated to leveraging diverse information-theoretic techniques to precisely capture both view-common and view-unique components from multiple views (Wang et al., 2019; Federici et al., 2020; Wang et al., 2023; Cui et al., 2024; Zhang et al., 2024), thereby yielding maximally disentangled representations and improving generalization ability. For instance, Kleinman et al. (2024) and Zhang et al. (2024) introduce the notion of Gács-Körner common information (Gács et al., 1973) and utilize the total correlation between consensus and complementary information to extract mutually independent cross-view common and unique components.
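The total correlation used above has a simple discrete form, $TC(X_1,\dots,X_n) = \sum_i H(X_i) - H(X_1,\dots,X_n)$, and vanishes exactly when the variables are independent, which is why minimizing it encourages disentangled common and unique components. A small numerical sketch of this definition (illustrative only; not the estimators used in the cited works):

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits of a flat probability vector
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def total_correlation(joint):
    """TC = sum_i H(X_i) - H(X_1, ..., X_n) for a discrete joint
    distribution given as an n-dimensional probability array."""
    joint = np.asarray(joint, dtype=float)
    marginals = [joint.sum(axis=tuple(j for j in range(joint.ndim) if j != i))
                 for i in range(joint.ndim)]
    return sum(entropy(m.ravel()) for m in marginals) - entropy(joint.ravel())

corr = np.array([[0.5, 0.0], [0.0, 0.5]])    # two perfectly correlated bits
indep = np.array([[0.25, 0.25], [0.25, 0.25]])  # two independent fair bits
```

Perfectly correlated bits give TC = 1 bit (all information is shared), while independent bits give TC = 0; disentanglement objectives push the unique components toward the latter regime.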
Generalized Trusted Multi-view Classification Framework with Hierarchical Opinion Aggregation
Shi, Long, Tang, Chuanqing, Deng, Huangyi, Xu, Cai, Xing, Lei, Chen, Badong
Recently, multi-view learning has seen considerable interest in trusted decision-making. Previous methods are mainly inspired by the influential work of Han et al. (2021), which formulates a Trusted Multi-view Classification (TMC) framework that aggregates evidence from different views based on Dempster's combination rule. These methods consider only inter-view aggregation, leaving intra-view information unexploited. In this paper, we propose a generalized trusted multi-view classification framework with hierarchical opinion aggregation. This hierarchical framework includes a two-phase aggregation process: the intra-view and inter-view aggregation hierarchies. In the intra-view aggregation, we assume that each view comprises common information shared with other views as well as view-specific information, and we aggregate both. This aggregation phase helps eliminate the feature noise inherent to each view, thereby improving view quality. In the inter-view aggregation, we design an attention mechanism at the evidence level to facilitate opinion aggregation from different views. To the best of our knowledge, this is one of the pioneering efforts to formulate a hierarchical aggregation framework in the trusted multi-view learning domain. Extensive experiments show that our model outperforms state-of-the-art trust-related baselines.
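Dempster's combination rule for two subjective opinions $(b^1, u^1)$ and $(b^2, u^2)$ over $K$ classes, in the reduced form popularized by the TMC line of work, discards the conflicting mass $C = \sum_{i \neq j} b^1_i b^2_j$ and renormalizes the rest. The sketch below implements this generic two-view rule, not the hierarchical aggregation proposed in this paper:

```python
import numpy as np

def dempster_combine(b1, u1, b2, u2):
    """Reduced Dempster combination of two subjective opinions (b, u) over
    K classes: mass assigned to conflicting class pairs is discarded and
    the surviving beliefs and uncertainty are renormalized by 1 - C."""
    b1, b2 = np.asarray(b1, float), np.asarray(b2, float)
    # conflicting mass: products of beliefs placed on *different* classes
    C = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    b = (b1 * b2 + b1 * u2 + b2 * u1) / (1.0 - C)  # combined beliefs
    u = (u1 * u2) / (1.0 - C)                       # combined uncertainty
    return b, u

# two mildly conflicting binary opinions; valid opinions sum to 1 with u
b, u = dempster_combine([0.6, 0.2], 0.2, [0.5, 0.3], 0.2)
```

Note that the combined uncertainty `u1 * u2 / (1 - C)` shrinks as views agree, so confident, consistent views dominate the fused opinion; more than two views are handled by applying the rule iteratively.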